A DISCUSSION ON MEAN EXCESS PLOTS 



SOUVIK GHOSH AND SIDNEY RESNICK 



Abstract. A widely used tool in the study of risk, insurance and extreme values is the mean 
excess plot. One use is for validating a generalized Pareto model for the excess distribution. 
This paper investigates some theoretical and practical aspects of the use of the mean excess 
plot. 



1. Introduction 



The distribution of the excess over a threshold u for a random variable X with distribution 
function F is defined as 



(1.1)    $F_u(x) = P[X - u \le x \mid X > u]$.



This excess distribution is the foundation for peaks over threshold (POT) modeling (Embrechts et al., 1997; Coles, 2001) which fits appropriate distributions to data of excesses. The use of peaks over threshold modeling is widespread and applications include:

• Hydrology: It is critical to model the level of water in a river or sea to avoid flooding. The level u could represent the height of a dam, levee or river bank. See Todorovic and Zelenhasic (1970) and Todorovic and Rousselle (1971).

• Actuarial science: Insurance companies set premium levels based on models for large losses. Excess of loss insurance pays for losses exceeding a contractually agreed amount. See Hogg and Klugman (1984), Embrechts et al. (2005).

• Survival analysis: The POT method is used for modeling lifetimes; see Guess and Proschan (1985).

• Environmental science: Public health agencies set standards for pollution levels. Exceedances of these standards generate public alerts or corrective measures; see Smith (1989).



Peaks over threshold modeling is based on the generalized Pareto class of distributions 
being appropriate for describing statistical properties of excesses. A random variable X has 
a generalized Pareto distribution (GPD) if it has a cumulative distribution function of the 
form 


(1.2)    $G_{\xi,\beta}(x) = 1 - (1 + \xi x/\beta)^{-1/\xi}$ if $\xi \neq 0$,    $G_{\xi,\beta}(x) = 1 - \exp(-x/\beta)$ if $\xi = 0$,

where $\beta > 0$, and $x \ge 0$ when $\xi \ge 0$ and $0 \le x \le -\beta/\xi$ if $\xi < 0$. The parameters $\xi$ and $\beta$ are referred to as the shape and scale parameters respectively. For a Pareto distribution, the tail index $\alpha$ is just the reciprocal of $\xi$ when $\xi > 0$. A special case is when $\xi = 0$ and in this case the GPD is the same as the exponential distribution with mean $\beta$.

The Pickands-Balkema-de Haan Theorem (Embrechts et al., 2005, Theorem 7.20, page 277) provides the theoretical justification for the centrality of the GPD class of distributions for peaks over threshold modeling. This result shows that for a large class of distributions (those distributions in a maximal domain of attraction of the extreme value laws), the excess distribution $F_u$ is asymptotically equivalent to a GPD law $G_{\xi,\beta(u)}$ as the threshold u approaches the right endpoint of the distribution F. Here the asymptotic shape parameter $\xi$ is fixed but the scale $\beta(u)$ may depend on u. More precise statements are given below in Theorems 3.1, 3.6 and 3.9. For this reason the GPD is a natural candidate for modeling peaks over a threshold.

The choice of the extreme threshold u, where the GPD model provides a suitable approximation to the excess distribution $F_u$, is critical in applications. The mean excess (ME) function is a popular tool used to aid this choice of u and also to determine the adequacy of the GPD model in practice. The ME function of a random variable X is defined as:

(1.3)    $M(u) := E[X - u \mid X > u]$,

provided $EX^+ < \infty$, and is also known as the mean residual life function, especially in survival analysis. It has been studied as early as Benktander and Segerdahl (1960). See Hall and Wellner (1981) for a discussion of properties of mean excess functions. Table 3.4.7 in (Embrechts et al., 1997, p. 161) gives the mean excess function for some standard distributions.

Given an independent and identically distributed (iid) sample $X_1, \dots, X_n$ from F(x), a natural estimate of M(u) is the empirical ME function $\hat M(u)$ defined as

(1.4)    $\hat M(u) = \frac{\sum_{i=1}^n (X_i - u) I_{[X_i > u]}}{\sum_{i=1}^n I_{[X_i > u]}}$,    $u \ge 0$.
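The estimator in (1.4) is simply the average of the excesses over u. A minimal sketch follows; the function name `empirical_mean_excess` and the toy data are illustrative assumptions, not part of the paper.

```python
from typing import Sequence


def empirical_mean_excess(sample: Sequence[float], u: float) -> float:
    """Empirical ME function (1.4): average of (x - u) over observations x > u."""
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        # The ratio in (1.4) is 0/0 when no observation exceeds u.
        raise ValueError("no observations exceed the threshold u")
    return sum(excesses) / len(excesses)


# Hypothetical toy data, used only to exercise the function.
data = [1.0, 2.0, 4.0, 7.0, 11.0]
print(empirical_mean_excess(data, 3.0))  # average of the excesses 1, 4 and 8
```

The estimate is undefined once u reaches the sample maximum, so in practice it is evaluated only at thresholds below the largest observation.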



Yang (1978) suggested the use of the empirical ME function and established the uniform strong consistency of $\hat M(u)$ over compact u-sets; that is, for any $b > 0$

(1.5)    $P\big[\lim_{n\to\infty} \sup_{0 \le u \le b} |\hat M(u) - M(u)| = 0\big] = 1$.



In the context of extremes, however, (1.5) is not especially informative since what is of interest is the behavior of $\hat M(u)$ in a neighborhood of the right end point of F, which could be $\infty$. In this case the GPD plays a pivotal role. For a random variable $X \sim G_{\xi,\beta}$, we have $E(X) < \infty$ iff $\xi < 1$ and in this case, the ME function of X is linear in u:

(1.6)    $M(u) = \frac{\beta}{1-\xi} + \frac{\xi}{1-\xi}\, u$,

where $0 \le u < \infty$ if $0 \le \xi < 1$ and $0 \le u \le -\beta/\xi$ if $\xi < 0$. In fact, the linearity of the mean excess function characterizes the GPD class. See Embrechts et al. (2005, 1997).
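The linearity in (1.6) can be checked numerically. The sketch below evaluates the theoretical ME function of a GPD and compares it with the empirical ME function of a simulated sample. The sampler inverts the GPD distribution function; all function names, the seed and the parameter values are illustrative assumptions.

```python
import random


def gpd_mean_excess(xi: float, beta: float, u: float) -> float:
    """Theoretical ME function (1.6) of a GPD; it exists only for xi < 1."""
    if xi >= 1:
        raise ValueError("the mean excess function is undefined for xi >= 1")
    return (beta + xi * u) / (1.0 - xi)


def sample_gpd(xi: float, beta: float, n: int, rng: random.Random) -> list:
    """Draw n GPD variates by inverting the distribution function (xi != 0)."""
    # 1 - rng.random() lies in (0, 1], so it is never raised to a negative
    # power at zero.
    return [beta / xi * ((1.0 - rng.random()) ** (-xi) - 1.0) for _ in range(n)]


def empirical_mean_excess(sample, u):
    """Average excess over u, as in (1.4)."""
    excesses = [x - u for x in sample if x > u]
    return sum(excesses) / len(excesses)


rng = random.Random(0)      # fixed seed so the run is reproducible
xi, beta = 0.25, 1.0        # assumed example parameters
sample = sample_gpd(xi, beta, 50_000, rng)
for u in (0.0, 1.0, 2.0):
    print(u, gpd_mean_excess(xi, beta, u), empirical_mean_excess(sample, u))
```

With these parameters the theoretical values increase by $\xi/(1-\xi) = 1/3$ per unit of u, and the simulated values should track them closely at moderate thresholds.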



Davison and Smith (1990) used this property to devise a simple graphical check that data conform to a GPD model; their method is based on the ME plot, which is the plot of the points $\{(X_{(k)}, \hat M(X_{(k)})) : 1 < k \le n\}$, where $X_{(1)} \ge X_{(2)} \ge \dots \ge X_{(n)}$ are the order statistics of the data. If the ME plot is close to linear for high values of the threshold then there is no evidence against use of a GPD model. See also Embrechts et al. (1997) and Hogg and Klugman (1984) for the implementation of this plot in practice.

In this paper we establish the asymptotic behavior of the ME plots for large thresholds. We assume F is in the maximal domain of attraction of an extreme value law with shape parameter $\xi$. When $\xi < 1$, we show that, as expected, for high thresholds the ME plot viewed as a random closed set converges in the Fell topology to a straight line. A novel aspect of our study is that we also consider the ME plot in the case $\xi > 1$, the case where the ME function does not exist, and show that the ME plot converges to a random curve. This also holds in the more delicate case $\xi = 1$ after suitable rescaling. These results show that the ME plot is inconsistent when $\xi \ge 1$ and emphasize that knowledge of a finite mean is required.

It is tempting to argue that consistency of the ME plot should imply, by a continuity argument, the consistency of the estimator of $\xi$ obtained from computing the slope of the line fit to the ME plot. However, this slope functional is not necessarily continuous, as discussed in Das and Resnick (2008). So consistency of the slope estimator requires further work and is an ongoing investigation.

The paper is arranged as follows. In Section 2 we briefly discuss required background on convergence of random closed sets and then study the ME plot in Section 3. In Section 4 we discuss advantages and disadvantages of the mean excess plot and how this tool compares with other techniques of extreme value theory such as the Hill estimator, the Pickands estimator and the QQ plot. We illustrate the behavior of the empirical mean excess plot for some simulated data sets in Section 5 and in Section 6 we analyze three real data sets obtained from different subject areas and also compare different tools.

2. Background 

2.1. Topology on closed sets of $\mathbb{R}^2$. Before we start any discussion on whether a mean excess plot is a reasonable diagnostic tool we need to understand what it means to talk about convergence of plots. So we discuss the topology on a set containing the plots.

We denote the collection of closed subsets of $\mathbb{R}^2$ by $\mathcal{F}$. We consider a hit and miss topology on $\mathcal{F}$ called the Fell topology. The Fell topology is generated by the families $\{\mathcal{F}^K, K \text{ compact}\}$ and $\{\mathcal{F}_G, G \text{ open}\}$ where for any set B

$\mathcal{F}^B = \{F \in \mathcal{F} : F \cap B = \emptyset\}$ and $\mathcal{F}_B = \{F \in \mathcal{F} : F \cap B \neq \emptyset\}$.

So $\mathcal{F}^B$ and $\mathcal{F}_B$ are collections of closed sets which miss and hit the set B, respectively. This is why such topologies are called hit and miss topologies. In the Fell topology a sequence of closed sets $\{F_n\}$ converges to $F \in \mathcal{F}$ if and only if the following two conditions hold:

• F hits an open set G implies there exists $N \ge 1$ such that for all $n \ge N$, $F_n$ hits G.

• F misses a compact set K implies there exists $N \ge 1$ such that for all $n \ge N$, $F_n$ misses K.

The Fell topology on the closed sets of $\mathbb{R}^2$ is metrizable and we indicate convergence in this topology of a sequence $\{F_n\}$ of closed sets to a limit closed set F by $F_n \to F$. Sometimes, rather than work with the topology, it is easier to deal with the following characterization of convergence.

Lemma 2.1. A sequence $F_n \in \mathcal{F}$ converges to $F \in \mathcal{F}$ in the Fell topology if and only if the following two conditions hold:

(1) For any $t \in F$ there exists $t_n \in F_n$ such that $t_n \to t$.

(2) If for some subsequence $(m_n)$, $t_{m_n} \in F_{m_n}$ converges, then $\lim_{n\to\infty} t_{m_n} \in F$.






See Theorem 1-2-2 in (Matheron, 1975, p. 6) for a proof of this Lemma. Since the topology is metrizable, the definition of convergence in probability is obvious. The following result is a well-known and helpful characterization for convergence in probability of random variables and it holds for random sets as well; see Theorem 6.21 in (Molchanov, 2005, p. 92).

Lemma 2.2. A sequence of random sets $(F_n)$ in $\mathcal{F}$ converges in probability to a random set F if and only if for every subsequence $(n')$ of $\mathbb{Z}_+$ there exists a further subsequence $(n'')$ of $(n')$ such that $F_{n''} \to F$ a.s.

We use the following notation: For a real number x and a set $A \subset \mathbb{R}^n$, $xA = \{xy : y \in A\}$.

Matheron (1975) and Molchanov (2005) are good references for the theory of random sets.



2.2. Miscellany. Throughout this paper we will take $k := k_n$ to be a sequence increasing to infinity such that $k_n/n \to 0$. For a distribution function F(x) we write $\bar F(x) = 1 - F(x)$ for the tail and the quantile function is

$b(u) := F^{\leftarrow}(1 - 1/u) = \inf\{s : F(s) \ge 1 - 1/u\} = \big(1/(1-F)\big)^{\leftarrow}(u)$.

A function $U : (0,\infty) \to \mathbb{R}_+$ is regularly varying with index $\rho \in \mathbb{R}$, written $U \in RV_\rho$, if

$\lim_{t\to\infty} \frac{U(tx)}{U(t)} = x^{\rho}$,    $x > 0$.
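The ratio condition defining regular variation can be illustrated numerically. The sketch below uses the hypothetical example $U(t) = t^{\rho}\log(1+t)$, which is regularly varying with index $\rho$ because the logarithmic factor is slowly varying. The function names and the chosen values of t are assumptions made for the illustration.

```python
import math


def U(t: float, rho: float = 0.5) -> float:
    """Hypothetical example: t^rho * log(1 + t), regularly varying with index rho."""
    return t ** rho * math.log1p(t)


def rv_ratio(t: float, x: float, rho: float = 0.5) -> float:
    """The ratio U(tx)/U(t), which should approach x^rho as t grows."""
    return U(t * x, rho) / U(t, rho)


x, rho = 4.0, 0.5
for t in (1e1, 1e4, 1e16, 1e200):
    print(t, rv_ratio(t, x, rho), x ** rho)
```

The convergence is slow because the slowly varying factor contributes $\log(1+tx)/\log(1+t)$, which tends to 1 only at a logarithmic rate. This is one reason why procedures relying on regular variation can need very high thresholds.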



We denote the space of nonnegative Radon measures $\mu$ on $(0,\infty]$ metrized by the vague metric by $M_+(0,\infty]$. Point measures are written as a function of their points $\{x_i, i = 1, \dots, n\}$ by $\sum_{i=1}^n \delta_{x_i}$. See, for example, (Resnick, 1987, Chapter 3).



We will use the following notations to denote different classes of functions: For $0 \le a < b \le \infty$

(i) $D[a,b)$: Right-continuous functions with finite left limits defined on $[a,b)$.

(ii) $D_l[a,b)$: Left-continuous functions with finite right limits defined on $[a,b)$.

We will assume that these spaces are equipped with the Skorokhod topology and the distance function. In some cases we will also consider product spaces of functions and then the topology will be the product topology. For example, $D_l^2[1,\infty)$ will denote the class of 2-dimensional functions on $[1,\infty)$. The classes of functions defined on the sets $[a,b]$ or $(a,b]$ will have the obvious notation.

3. Mean Excess Plots 

As discussed in the introduction, a random variable having $G_{\xi,\beta}$ distribution with $\xi < 1$ has a linear ME function given by (1.6), whose slope $\xi/(1-\xi)$ is positive ($0 < \xi < 1$), negative ($\xi < 0$) or zero ($\xi = 0$). We consider these three cases separately.

3.1. Positive Slope. In this subsection we concentrate on the case where $\xi > 0$. A finite mean for F is guaranteed when $\xi < 1$ and we also investigate what happens when $\xi \ge 1$.

The following Theorem is a combination of Theorem 3.3.7 and Theorem 3.4.13(b) in Embrechts et al. (1997).



Theorem 3.1. Assume $\xi > 0$. The following are equivalent for a cumulative distribution function F:

(1) $\bar F \in RV_{-1/\xi}$, i.e., for every $t > 0$, $\lim_{x\to\infty} \bar F(tx)/\bar F(x) = t^{-1/\xi}$.

(2) F is in the maximal domain of attraction of a Frechet distribution with parameter $1/\xi$, i.e.,

$\lim_{n\to\infty} F^n(c_n x) = \exp\{-x^{-1/\xi}\}$ for all $x > 0$,

where $c_n = F^{\leftarrow}(1 - 1/n)$.

(3) There exists a positive measurable function $\beta(u)$ such that

(3.1)    $\lim_{u\to\infty} \sup_{x \ge u} |F_u(x) - G_{\xi,\beta(u)}(x)| = 0$.

Theorem 3.1(3) is one case of the Pickands-Balkema-de Haan theorem. It guarantees the existence of a measurable function $\beta(u)$ for which (3.1) holds but does not construct this function. However, $\beta(u)$ can be obtained from Karamata's representation of a regularly varying function (Bingham et al., 1989), namely if $\bar F \in RV_{-1/\xi}$ there exists $0 < z < \infty$ such that

$\bar F(x) = c(x) \exp\Big\{-\int_z^x \frac{1}{a(t)}\,dt\Big\}$ for all $z < x < \infty$,

where $c(x) \to c > 0$ and $a(x)/x \to \xi$ as $x \to \infty$. An easy computation shows as $u \to \infty$,

$\frac{\bar F(u + x a(u))}{\bar F(u)} = (1+o(1)) \exp\Big\{-\int_u^{u + x a(u)} \frac{dt}{a(t)}\Big\} = (1+o(1)) \exp\Big\{-\frac{1}{\xi}\int_u^{u + x\xi u(1+o(1))} \frac{dt}{t}\Big\} \to (1 + \xi x)^{-1/\xi}$.

This means that if X is a random variable having distribution F then for large u,

$P\Big[\frac{X - u}{a(u)} \le x \,\Big|\, X > u\Big] \approx 1 - (1 + \xi x)^{-1/\xi}$,

and a(u) is a choice for the scale parameter $\beta(u)$ in (3.1). Hence we get that $\beta(u)/u \to \xi$ as $u \to \infty$ by the convergence to types theorem (Resnick, 1987).
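For an exact Pareto tail the approximation above is an identity, which gives a quick sanity check of the scaling $\beta(u)/u \to \xi$. The sketch below assumes the hypothetical tail $\bar F(x) = x^{-1/\xi}$ for $x \ge 1$ and verifies that the excess tail over u equals the GPD tail with scale $\beta(u) = \xi u$. All names are illustrative.

```python
def pareto_tail(x: float, xi: float) -> float:
    """Assumed example tail: Fbar(x) = x^(-1/xi) for x >= 1."""
    return x ** (-1.0 / xi)


def excess_tail(x: float, u: float, xi: float) -> float:
    """P[X - u > x | X > u] for the Pareto tail above."""
    return pareto_tail(u + x, xi) / pareto_tail(u, xi)


def gpd_tail(x: float, xi: float, beta: float) -> float:
    """1 - G_{xi,beta}(x) for xi != 0, following (1.2)."""
    return (1.0 + xi * x / beta) ** (-1.0 / xi)


xi = 0.5
for u in (1.0, 5.0, 50.0):
    for x in (0.5, 3.0, 40.0):
        # With beta(u) = xi * u the two tails agree up to rounding error.
        print(u, x, excess_tail(x, u, xi), gpd_tail(x, xi, xi * u))
```

For distributions that are only regularly varying, rather than exactly Pareto, the two columns would agree only approximately and only for large u.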

Consider the ME plot for iid random variables having common distribution F which satisfies $\bar F \in RV_{-1/\xi}$ for some $\xi > 0$. Since the excess distribution is well approximated by the GPD for high thresholds, we expect that for $\xi < 1$, the ME function will look similar to that of the GPD for high thresholds and therefore seek evidence of linearity in the plot. We first consider the ME plot when $0 < \xi < 1$ and will discuss the case $\xi \ge 1$ separately. Furthermore, we see that for each $n \ge 1$, the mean excess plot, being a finite set of $\mathbb{R}^2$-valued random variables, is measurable and a random closed set. This follows from the definition of random sets; see Definition 1.1 in (Molchanov, 2005, p. 2).

3.1.1. Heavy tail with a finite mean; $0 < \xi < 1$. The scaled and thresholded ME plot converges to a deterministic line.

Theorem 3.2. If $(X_n, n \ge 1)$ are iid observations with distribution F satisfying $\bar F \in RV_{-1/\xi}$ with $0 < \xi < 1$, then in $\mathcal{F}$

(3.2)    $S_n := \frac{1}{X_{(k)}} \big\{ (X_{(i)}, \hat M(X_{(i)})) : i = 2, \dots, k \big\} \xrightarrow{P} S := \big\{ \big(t, \tfrac{\xi}{1-\xi}\, t\big) : t \ge 1 \big\}$.

Remark 3.3. Roughly, this result implies

$X_{(k)} S_n := \big\{ (X_{(i)}, \hat M(X_{(i)})) : i = 2, \dots, k_n \big\} \approx X_{(k)} S$.

The plot of the points $S_n$ is a little different from the original ME plot. In practice, people plot the points $\{(X_{(i)}, \hat M(X_{(i)})) : 1 < i \le n\}$ but our result restricts attention to the higher order statistics corresponding to $X_{(1)}, \dots, X_{(k)}$. This restriction is natural and corresponds to looking at observations over high thresholds. One imagines zooming into the area of interest in the complete ME plot.

This result scales the points $(X_{(i)}, \hat M(X_{(i)}))$ by $X_{(k)}^{-1}$. Since both coordinates of the points in the plot are scaled, we do not change the structure or appearance of the plot but only the scale of the axes. Hence we may still estimate the slope of the line if we want to estimate $\xi$ by this method (Davison and Smith, 1990). The scaling is important because the points $\{(X_{(i)}, \hat M(X_{(i)})) : 1 < i \le k\}$ are moving to infinity and the Fell topology is not equipped to handle sets which are moving out to infinity. Furthermore, the regular variation assumption on the tail of F involves a ratio condition and thus it is natural that the random set convergence uses scaling.

A central assumption in Theorem 3.2 is that the random variables $\{X_i\}$ are iid. The proof of the theorem below will explain that an important tool is the convergence of the tail empirical measure $\hat\nu_n$ in (3.4). By Proposition 2.1 in Resnick and Starica (1998), we know that the iid assumption is not a necessary condition for the convergence of the tail empirical measure. We believe that as long as the tail empirical measure converges, our result should hold.
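The scaled and thresholded plot of Theorem 3.2 is straightforward to compute from the order statistics. The following sketch simulates exact Pareto data with $\xi = 0.25$ (an assumed example) and compares the point at $i = k$ with the limiting slope $\xi/(1-\xi)$. The function name, the seed and the choices of n and k are illustrative assumptions, and ties among observations are assumed absent.

```python
import random


def scaled_me_plot(sample, k):
    """Points (X_(i)/X_(k), Mhat(X_(i))/X_(k)) for i = 2..k, as in (3.2)."""
    xs = sorted(sample, reverse=True)   # X_(1) >= X_(2) >= ... >= X_(n)
    scale = xs[k - 1]                   # X_(k)
    points, running_sum = [], 0.0
    for i in range(2, k + 1):
        running_sum += xs[i - 2]        # sum of the i-1 largest observations
        threshold = xs[i - 1]           # X_(i)
        mean_excess = running_sum / (i - 1) - threshold
        points.append((threshold / scale, mean_excess / scale))
    return points


rng = random.Random(1)                  # fixed seed for reproducibility
xi = 0.25
n, k = 100_000, 2_000
# Pareto sample with tail x^(-1/xi), x >= 1, by inversion of the tail.
sample = [(1.0 - rng.random()) ** (-xi) for _ in range(n)]
points = scaled_me_plot(sample, k)
t, m = points[-1]                       # the point with i = k, where t = 1
print(t, m, xi / (1.0 - xi) * t)
```

Points with small i rest on very few exceedances and are much noisier than the point at $i = k$, so the comparison is most informative at the lower end of the thresholded plot.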

Proof. We show that for every subsequence $m_n$ of integers there exists a further subsequence $l_n$ of $m_n$ such that

(3.3)    $S_{l_n} \to S$ a.s.

Define the tail empirical measure as a random element of $M_+(0,\infty]$ by

(3.4)    $\hat\nu_n := \frac{1}{k} \sum_{i=1}^n \delta_{X_i/X_{(k)}}$.

Following (4.21) in (Resnick, 2007, p. 83) we get that

(3.5)    $\hat\nu_n \Rightarrow \nu$ in $M_+(0,\infty]$,

where $\nu(x,\infty] = x^{-1/\xi}$, $x > 0$. Now consider

$S_n(u) = \Big( \frac{X_{(\lceil ku \rceil)}}{X_{(k)}}, \frac{\hat M(X_{(\lceil ku \rceil)})}{X_{(k)}} \Big)$,    $u \in (0,1]$,

as random elements in $D_l^2(0,1]$. We will show that $S_n(\cdot) \xrightarrow{P} S(\cdot)$ in $D_l^2(0,1]$, where

$S(u) = \big( u^{-\xi}, \tfrac{\xi}{1-\xi}\, u^{-\xi} \big)$ for all $0 < u \le 1$.

We already know the result for the first component of $S_n$, i.e., $S_n^{(1)}(t) := X_{(\lceil kt \rceil)}/X_{(k)} \xrightarrow{P} t^{-\xi}$ in $D_l(0,1]$; see (Resnick, 2007, p. 82). Since the limits are non-random it suffices to prove the convergence of the second component of $S_n$. Observe that the empirical mean excess function can be obtained from the tail empirical measure:

$\frac{\hat M(X_{(\lceil ku \rceil)})}{X_{(k)}} = \frac{k}{\lceil ku \rceil - 1} \int_{X_{(\lceil ku \rceil)}/X_{(k)}}^{\infty} \hat\nu_n(x,\infty]\,dx$.

Consider the maps T and $T_K$ from $M_+(0,\infty]$ to $D_l[1,\infty)$ defined by

$T(\mu)(t) = \int_t^{\infty} \mu(x,\infty]\,dx$ and $T_K(\mu)(t) = \int_t^{K \vee t} \mu(x,\infty]\,dx$.

We understand $T(\mu)(t) = \infty$ if $\mu(x,\infty]$ is not integrable. We will show that $T(\hat\nu_n) \xrightarrow{P} T(\nu)$. The function $T_K$ is obviously continuous and therefore $T_K(\hat\nu_n) \xrightarrow{P} T_K(\nu)$ in $D_l[1,\infty)$. In order to prove that $T(\hat\nu_n) \xrightarrow{P} T(\nu)$ it suffices to show that for any $\epsilon > 0$

$\lim_{K\to\infty} \limsup_{n\to\infty} P\big[ \|T_K(\hat\nu_n) - T(\hat\nu_n)\| > \epsilon \big] = 0$,

where $\|\cdot\|$ is the supnorm on $D_l[1,\infty)$. To verify this claim, note that

$\lim_{K\to\infty} \limsup_{n\to\infty} P\big[ \|T_K(\hat\nu_n) - T(\hat\nu_n)\| > \epsilon \big] \le \lim_{K\to\infty} \limsup_{n\to\infty} P\Big[ \int_K^{\infty} \hat\nu_n(x,\infty]\,dx > \epsilon \Big]$

and the rest is proved easily following the arguments used in step 3 of the proof of Theorem 4.2 in (Resnick, 2007, p. 81).

Suppose $D_{l,\ge 1}(0,1]$ is the subspace of $D_l(0,1]$ consisting only of functions which are never less than 1. Consider the random element $Y_n$ in the space $D_{l,\ge 1}(0,1] \times D_l[1,\infty)$,

$Y_n = \Big( \frac{X_{(\lceil k\,\cdot \rceil)}}{X_{(k)}}, T(\hat\nu_n) \Big)$.

From what we have obtained so far it is easy to check that $Y_n \xrightarrow{P} Y$, where

$Y(u,t) = \big( u^{-\xi}, \tfrac{\xi}{1-\xi}\, t^{-(1-\xi)/\xi} \big)$.

The map $\tilde T : D_{l,\ge 1}(0,1] \times D_l[1,\infty) \to D_l(0,1]$ defined by

$\tilde T(f,g)(u) = g(f(u))$ for all $0 < u \le 1$

is continuous if g and f are continuous and therefore

$\tilde T(Y_n)(u) = \int_{X_{(\lceil ku \rceil)}/X_{(k)}}^{\infty} \hat\nu_n(x,\infty]\,dx \xrightarrow{P} \frac{\xi}{1-\xi}\, u^{1-\xi}$ in $D_l(0,1]$.

This finally shows the convergence of the second component of $S_n$ and hence we get that $S_n(\cdot) \xrightarrow{P} S(\cdot)$.

Next we have to convert this result to that of convergence of the random sets $S_n$. This argument is similar to the one used to prove Lemma 2.1.3 in Das and Resnick (2008). Choose any subsequence $(m_n)$ of integers. Since $S_n(\cdot) \xrightarrow{P} S(\cdot)$ we have $S_{m_n}(\cdot) \xrightarrow{P} S(\cdot)$ in $D_l^2(0,1]$. So there exists a subsequence $(l_n)$ of $(m_n)$ such that $S_{l_n}(\cdot) \to S(\cdot)$ a.s. Now the final step is to use this to prove (3.3) and for that we will use Lemma 2.1. Take any point in S of the form $(t, \xi/(1-\xi)\,t)$ for some $t \ge 1$. Set $u = t^{-1/\xi}$ and observe that $S_{l_n}(u) \to (t, \xi/(1-\xi)\,t)$ and $S_{l_n}(u) \in S_{l_n}$. This proves condition (1) of Lemma 2.1 and we next prove condition (2). Suppose for some subsequence $(j_n)$ of $(l_n)$, $S_{j_n}(u_n)$ converges to (x, y). Since $S^{(1)}(u)$ is strictly monotone we get that there must be some $0 < u \le 1$ such that $u_n \to u$ as $n \to \infty$. Now, since $S_{j_n} \to S$ and S is continuous we get that $S_{j_n}(u_n) \to S(u) \in S$. That completes the proof. □

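3.1.2. Case $\xi \ge 1$; limit sets are random. The following theorem describes the asymptotic behavior of the ME plot when $\xi \ge 1$. Reminder: $\xi > 1$ guarantees an infinite mean.

Theorem 3.4. Assume $(X_n, n \ge 1)$ are iid observations with distribution F satisfying $\bar F \in RV_{-1/\xi}$:

(i) If $\xi > 1$, then

(3.6)    $S_n := \Big\{ \Big( \frac{X_{(i)}}{b(n/k)}, \frac{\hat M(X_{(i)})}{b(n)/k} \Big) : i = 2, \dots, k \Big\} \Rightarrow S := \big\{ (t^{\xi}, t S_{1/\xi}) : t \ge 1 \big\}$

in $\mathcal{F}$, where $b(n) := F^{\leftarrow}(1 - 1/n)$ and $S_{1/\xi}$ is the positive stable random variable with index $1/\xi$ which satisfies for $t \in \mathbb{R}$

(3.7)    $E[e^{itS_{1/\xi}}] = \exp\Big\{ -\Gamma\big(1 - \tfrac{1}{\xi}\big) \cos\frac{\pi}{2\xi}\, |t|^{1/\xi} \Big[ 1 - i\,\mathrm{sgn}(t) \tan\frac{\pi}{2\xi} \Big] \Big\}$.

(ii) If $\xi = 1$ and k satisfies $k = k(n) \to \infty$, $k/n \to 0$, and

(3.8)    $k b(n/k)/b(n) \to 1$    $(n \to \infty)$,

then

(3.9)    $S_n := \Big\{ \Big( \frac{X_{(i)}}{b(n/k)}, \frac{\hat M(X_{(i)})}{b(n/k)} - \frac{k C_{n,k}}{i\, b(n)} \Big) : i = 2, \dots, k \Big\} \Rightarrow S := \big\{ \big(t, t(S_1 - 1 - \log t)\big) : t \ge 1 \big\}$

in $\mathcal{F}$, where

$C_{n,k} = n\big( E[X_1 I_{[X_1 \le b(n)]}] - E[X_1 I_{[X_1 \le b(n/k)]}] \big)$

and $S_1$ is a positively skewed stable random variable satisfying

$E[e^{itS_1}] = \exp\Big\{ it \int_0^{\infty} \Big( \frac{\sin x}{x^2} - \frac{1}{x(1+x)} \Big) dx - |t| \Big[ \frac{\pi}{2} + i\,\mathrm{sgn}(t) \log|t| \Big] \Big\}$.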



Remark 3.5. In Theorem 3.2 we considered the points of the mean excess plot normalized by $X_{(k)}$. By scaling both coordinates by the same normalizing sequence, we did not change the structure of the plot. But in Theorem 3.4(i) we need different scaling in the two coordinates. This is simple to observe since $b(n) = F^{\leftarrow}(1 - n^{-1}) \in RV_{\xi}$ and $\xi > 1$ implies $k b(n/k)/b(n) \to 0$ as $n \to \infty$. This means that in order to get a finite limit we need to normalize the second coordinate by a sequence increasing at a much faster rate than the normalizing sequence for the first coordinate. This is indeed changing the structure of the plot and even with this normalization the limiting set is random. The limit is a curve scaled in the second coordinate by the random quantity $S_{1/\xi}$. Note that the limit is independent of the choice of the sequence $k_n$ as long as it satisfies the condition that $k_n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$.

Another interesting outcome, as pointed out by a referee, is that in the log-log scale the limit set becomes

$\log S = \big\{ \big(u, \tfrac{1}{\xi} u + \log S_{1/\xi}\big) : u \ge 0 \big\}$,

which is a straight line with slope $1/\xi$ and a random intercept term $\log S_{1/\xi}$. This follows by taking logarithms of both coordinates of $(t^{\xi}, t S_{1/\xi})$ and setting $u = \xi \log t$.

In Theorem 3.4(ii) along with $\xi = 1$ we make the extra assumption (3.8). Under these assumptions we get that the mean excess plot with some centering in the second coordinate converges to a random set. We could obtain a result without (3.8) but then the centering becomes random and more complicated and difficult to interpret. The significance of (3.8) is as follows: The centering $C_{n,k}$ is of the form

$C_{n,k} = n\big(\pi(n) - \pi(n/k)\big)$,

where $\pi(t) = \int_0^{b(t)} \bar F(s)\,ds$ is in the de Haan class $\Pi$ and has slowly varying auxiliary function $g(t) := b(t)/t$; see Resnick (2007), Bingham et al. (1989), de Haan (1976) and de Haan and Ferreira (2006). Condition (3.8) is the same as requiring k to satisfy $g(n/k)/g(n) \to 1$.



Proof. (i) We will first prove that

(3.10)    $Y_n(t) := \Big( \frac{X_{(\lceil k/t \rceil)}}{b(n/k)}, \frac{\hat M(X_{(\lceil k/t \rceil)})}{b(n)/k} \Big) \Rightarrow Y(t) := (t^{\xi}, t S_{1/\xi})$ in $D^2[1,\infty)$.

The two important facts that we will need for the proof are the following:

(A) Csorgo and Mason (1986) showed that for any $k_n \to \infty$ satisfying $k_n/n \to 0$,

$\frac{1}{b(n)} \sum_{i=1}^{k_n} X_{(i)} \Rightarrow S_{1/\xi}$.

(B) Under the same assumption on the sequence $k_n$ (Resnick, 2007, p. 82),

(3.11)    $Y_n^{(1)}(t) = \frac{X_{(\lceil k/t \rceil)}}{b(n/k)} \xrightarrow{P} Y^{(1)}(t) = t^{\xi}$ in $D[1,\infty)$.

Since $Y^{(1)}$ is non-random, in order to prove (3.10) it suffices to show that $Y_n^{(2)}(t) \Rightarrow Y^{(2)}(t)$ in $D[1,\infty)$ (Resnick, 2007, Proposition 3.1, p. 57). By Theorem 16.7 in (Billingsley, 1999, p. 174) we need to show that $Y_n^{(2)}(t) \Rightarrow Y^{(2)}(t)$ in $D[1,N]$ for every $N > 1$. So fix $N > 1$ arbitrarily. By an abuse of notation we will use Y and $Y_n$ to denote their restrictions to $[1,N]$ as elements of $D[1,N]$.

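Observe that $b(n) \in RV_{\xi}$ and since $\xi > 1$ we get $k b(n/k)/b(n) \to 0$ as $n \to \infty$. Combining this with (B) we get that for any $t \ge 1$,

(3.12)    $\frac{k}{b(n)} X_{(\lceil k/t \rceil)} \xrightarrow{P} 0$.

Also observe that for any $1 \le t_1 < t_2 \le N$

(3.13)    $\frac{1}{b(n)} \sum_{i=\lceil k/t_2 \rceil + 1}^{\lceil k/t_1 \rceil} X_{(i)} \le \Big( \frac{k}{t_1} - \frac{k}{t_2} + 1 \Big) \frac{X_{(\lceil k/t_2 \rceil)}}{b(n)} \xrightarrow{P} 0$.

Using (A), (3.12), (3.13) and Proposition 3.1 in (Resnick, 2007, p. 57) we get that for any $1 \le t_1 < t_2 \le N$

(3.14)    $\frac{1}{b(n)} \Big( \sum_{i=1}^{\lceil k/t_2 \rceil - 1} X_{(i)},\ \sum_{i=\lceil k/t_2 \rceil}^{\lceil k/t_1 \rceil - 1} X_{(i)},\ k X_{(\lceil k/t_2 \rceil)},\ k X_{(\lceil k/t_1 \rceil)} \Big) \Rightarrow (S_{1/\xi}, 0, 0, 0)$.

This allows us to obtain the weak limit of $(Y_n^{(2)}(t_1), Y_n^{(2)}(t_2))$:

$(Y_n^{(2)}(t_1), Y_n^{(2)}(t_2)) = \frac{k}{b(n)} \big( \hat M(X_{(\lceil k/t_1 \rceil)}), \hat M(X_{(\lceil k/t_2 \rceil)}) \big)$

$= \frac{k}{b(n)} \Big( \frac{1}{\lceil k/t_1 \rceil - 1} \sum_{i=1}^{\lceil k/t_1 \rceil - 1} \big( X_{(i)} - X_{(\lceil k/t_1 \rceil)} \big),\ \frac{1}{\lceil k/t_2 \rceil - 1} \sum_{i=1}^{\lceil k/t_2 \rceil - 1} \big( X_{(i)} - X_{(\lceil k/t_2 \rceil)} \big) \Big) \Rightarrow (t_1, t_2) S_{1/\xi}$.

By similar arguments we can also show that for any $1 \le t_1 < t_2 < \dots < t_m \le N$,

$(Y_n^{(2)}(t_1), \dots, Y_n^{(2)}(t_m)) \Rightarrow (t_1, \dots, t_m) S_{1/\xi}$.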
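From Billingsley (1999), Theorem 13.3, p. 141, the proof of (3.10) will be complete if we show for any $\epsilon > 0$

$\lim_{\delta \to 0} \lim_{n \to \infty} P\big[ w_N(Y_n^{(2)}, \delta) \ge \epsilon \big] = 0$,

where for any $g \in D[1,N]$

$w_N(g, \delta) = \sup_{1 \le t_1 \le t \le t_2 \le N,\ t_2 - t_1 \le \delta} \big\{ |g(t) - g(t_1)| \wedge |g(t_2) - g(t)| \big\}$.

Fix any $\epsilon > 0$ and choose n large enough such that $X_{(k)} > 0$ and $k/N > 1$. Then for any $1 \le t_1 < t_2 \le N$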



y^(h)-YP( tl )\ 



< 



1 


k 


b{n) 


\k/t 2 ] - 1 


1 


( * 



\k/t2\-l 



\k/t{\-l 



i=l 



\k/t{\ 



i) 



[k/N]-l 



b(n) \ \k/t 2 ] 



+ 



1 



rfc/*2i— i 



< 



i 



it 



i= ffc /AT] 

k 



1) 



ffc/Ar"!— i 



6(n) \ r*:/*2l 



4/c7V 



(fWI) 
Therefore, using the form of the function $w_N$ we get



lim lim P\w N (Yj< 2 \5) > el 



< lim lim P 

< lim lim P 



sup 

l<tl<t2<N,*2-tl<<* 

sup 



5-X)n-kx) I l<t 1 <t 2 <N,t 2 -ti<S 

0. 



> e 



lim P 

5^0 



Hence we have proved (3.10). 



Now we prove the statement of the theorem. By Proposition 6.10, page 87 in Molchanov (2005) it suffices to show that for any continuous function $f : \mathbb{R}^2 \to \mathbb{R}_+$ with a compact support

$\lim_{n\to\infty} E\big[ \sup_{x \in S_n} f(x) \big] = E\big[ \sup_{x \in S} f(x) \big]$.

Suppose $f : \mathbb{R}^2 \to \mathbb{R}_+$ is a continuous function with compact support. By the Skorokhod representation theorem (see Theorem 6.7 in (Billingsley, 1999, p. 70)) there exists a probability space $(\Omega, \mathcal{G}, P)$ and random elements $Y_n^*(t)$ and $Y^*(t)$ in $D^2[1,\infty)$ such that $Y_n \stackrel{d}{=} Y_n^*$ and $Y \stackrel{d}{=} Y^*$ and $Y_n^*(t)(\omega) \to Y^*(t)(\omega)$ in $D^2[1,\infty)$ for every $\omega \in \Omega$. Now observe that

$\sup_{x \in S} f(x) \stackrel{d}{=} \sup_{t \ge 1} f(Y^*(t))$ and $\sup_{x \in S_n} f(x) \stackrel{d}{=} \sup_{t \ge 1} f(Y_n^*(t))$.

Since f is continuous we get

$\sup_{t \ge 1} f(Y_n^*(t)) \to \sup_{t \ge 1} f(Y^*(t))$    P-a.s.

and since f is bounded we apply the dominated convergence theorem to get

$\lim_{n\to\infty} E\big[ \sup_{x \in S_n} f(x) \big] = \lim_{n\to\infty} E\big[ \sup_{t \ge 1} f(Y_n^*(t)) \big] = E\big[ \sup_{t \ge 1} f(Y^*(t)) \big] = E\big[ \sup_{x \in S} f(x) \big]$

and that completes the proof of the theorem when $\xi > 1$.

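(ii) Similar to the proof of part (i) we will first prove that in $D^2[1,\infty)$

(3.15)    $Y_n(t) := \Big( \frac{X_{(\lceil k/t \rceil)}}{b(n/k)}, \frac{\hat M(X_{(\lceil k/t \rceil)})}{b(n/k)} - \frac{k}{\lceil k/t \rceil} \frac{C_{n,k}}{b(n)} \Big) \Rightarrow Y(t) := t\,(1, S_1 - 1 - \log t)$.

We will use the following facts:

(A) Csorgo and Mason (1986) showed that for any $k_n \to \infty$ satisfying $k_n/n \to 0$,

$\frac{1}{b(n)} \Big( \sum_{i=1}^{k} X_{(i)} - C_{n,k} \Big) \Rightarrow S_1$.

(B) For $k \to \infty$ with $k/n \to 0$, (3.11) still holds with $\xi = 1$.

By the same arguments used in part (i) it suffices to prove that for any arbitrary $N > 1$

$Y_n^{(2)}(t) \Rightarrow Y^{(2)}(t)$ in $D[1,N]$.

Observe that from (3.11) and the assumption that $k b(n/k)/b(n) \to 1$ we get for any $t \ge 1$

(3.16)    $\frac{1}{b(n)} \sum_{i=\lceil k/t \rceil}^{k} X_{(i)} = \frac{k b(n/k)}{b(n)} \cdot \frac{1}{k} \sum_{i=\lceil k/t \rceil}^{k} \frac{X_{(i)}}{b(n/k)} \xrightarrow{P} \log t$.

The reason for this is that

$\frac{1}{k} \sum_{i=\lceil k/t \rceil}^{k} \frac{X_{(i)}}{b(n/k)} = \int_{X_{(k)}/b(n/k)}^{X_{(\lceil k/t \rceil)}/b(n/k)} x\, \nu_n(dx) \xrightarrow{P} \int_1^t x\, x^{-2}\,dx = \log t$,

where $\nu_n(dx) = \frac{1}{k} \sum_{i=1}^n \delta_{X_i/b(n/k)}(dx) \Rightarrow x^{-2}\,dx$. See (3.5) and (3.11). Now fix any $1 \le t \le N$ and note that

$\frac{\hat M(X_{(\lceil k/t \rceil)})}{b(n/k)} - \frac{k C_{n,k}}{\lceil k/t \rceil\, b(n)}$

$= \frac{1}{b(n/k)} \cdot \frac{1}{\lceil k/t \rceil - 1} \sum_{i=1}^{\lceil k/t \rceil - 1} X_{(i)} - \frac{X_{(\lceil k/t \rceil)}}{b(n/k)} - \frac{k C_{n,k}}{\lceil k/t \rceil\, b(n)}$

$= (1 + o_P(1))\, t\, \Big[ \frac{1}{b(n)} \Big( \sum_{i=1}^{k} X_{(i)} - C_{n,k} \Big) - \frac{1}{b(n)} \sum_{i=\lceil k/t \rceil}^{k} X_{(i)} \Big] - \frac{X_{(\lceil k/t \rceil)}}{b(n/k)}$

$\Rightarrow t S_1 - t \log t - t$.

We complete the proof using the same arguments as those in part (i). □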



3.2. Negative Slope. The case when $\xi < 0$ is characterized by the following theorem which is a combination of Theorems 3.3.12 and 3.4.13(b) in Embrechts et al. (1997):

Theorem 3.6. If $\xi < 0$ then the following are equivalent for a distribution function F:

(1) F has a finite right end point $x_F$ and $\bar F(x_F - x^{-1}) \in RV_{1/\xi}$.

(2) F is in the maximal domain of attraction of a Weibull distribution with parameter $-1/\xi$, i.e.,

$F^n(x_F + c_n x) \to \exp\{-(-x)^{-1/\xi}\}$ for all $x \le 0$,

where $c_n = x_F - F^{\leftarrow}(1 - 1/n)$.

(3) There exists a measurable function $\beta(u)$ such that

$\lim_{u \to x_F} \sup_{u \le x \le x_F} |F_u(x) - G_{\xi,\beta(u)}(x)| = 0$.

Here we again get a characterization of this class of distributions in terms of the behavior of the maxima of iid random variables and the excess distribution. Using Theorem 3.6(1) and Karamata's Theorem (Bingham et al., 1989, Theorem 1.5.11, p. 28) we get that $M(u)/(x_F - u) \to \xi/(\xi - 1)$ as $u \to x_F$. We show that this behavior is observed empirically. The Pickands-Balkema-de Haan Theorem, part (3) of Theorem 3.6, does not explicitly construct the scale parameter $\beta(u)$ but as in Remark 3.3 one can show that $\beta(u)/(x_F - u) \to -\xi$ as $u \to x_F$.

Theorem 3.7. Suppose $(X_n, n \ge 1)$ are iid random variables with distribution F which has a finite right end point $x_F$ and satisfies $1 - F(x_F - x^{-1}) \in RV_{1/\xi}$ as $x \to \infty$ for some $\xi < 0$. Then

$S_n := \frac{1}{X_{(1)} - X_{(k)}} \big\{ (X_{(i)} - X_{(k)}, \hat M(X_{(i)})) : 1 < i \le k \big\} \xrightarrow{P} S := \big\{ \big(t, \tfrac{\xi}{1-\xi}(t - 1)\big) : 0 \le t \le 1 \big\}$

in $\mathcal{F}$.
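The limit $M(u)/(x_F - u) \to \xi/(\xi - 1)$ that underlies Theorem 3.7 can be illustrated with a simple bounded-support example. The sketch below assumes the hypothetical tail $\bar F(x) = (1-x)^a$ on $[0,1]$, for which $x_F = 1$ and $\xi = -1/a$, and evaluates $M(u) = \int_u^{x_F} \bar F(t)\,dt / \bar F(u)$ by a midpoint rule. All names and parameter values are illustrative assumptions.

```python
def mean_excess_numeric(tail, u, x_f, steps=100_000):
    """M(u) = (integral of the tail over (u, x_f)) / tail(u), by the midpoint rule."""
    h = (x_f - u) / steps
    integral = sum(tail(u + (j + 0.5) * h) for j in range(steps)) * h
    return integral / tail(u)


a = 2.0              # assumed tail exponent, so that xi = -1/a
xi = -1.0 / a
x_f = 1.0            # right end point of the example distribution


def tail(x):
    """Assumed example tail: Fbar(x) = (x_f - x)^a on [0, x_f]."""
    return (x_f - x) ** a


for u in (0.5, 0.9, 0.99):
    ratio = mean_excess_numeric(tail, u, x_f) / (x_f - u)
    print(u, ratio, xi / (xi - 1.0))
```

For this particular tail the ratio equals $1/(1+a) = \xi/(\xi-1)$ at every threshold, so the ME function is exactly linear with negative slope. For a general distribution satisfying Theorem 3.6(1) the agreement holds only as u approaches $x_F$.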



Remark 3.8. As in Subsection 3.1 we look at a modified version of the mean excess plot. Here we scale and relocate the points of the plot near the right end point. We may interpret this result as

$\big\{ (X_{(i)}, \hat M(X_{(i)})) : 1 < i \le k \big\} \approx (X_{(k)}, 0) + (X_{(1)} - X_{(k)}) S$,

where $S = \big\{ \big(t, \tfrac{\xi}{1-\xi}(t - 1)\big) : 0 \le t \le 1 \big\}$.

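Proof. The proof is similar to that of Theorem 3.2. From Theorem 5.3(ii), p. 139 in Resnick (2007) we get

$\nu_n := \frac{1}{k} \sum_{i=1}^n \delta_{(x_F - X_i)/c_{\lceil n/k \rceil}} \Rightarrow \nu$ in $M_+[0,\infty)$,

where $\nu[0,x) = x^{-1/\xi}$ for all $x \ge 0$ and $c_n = x_F - F^{\leftarrow}(1 - 1/n)$. Following the arguments used in the proof of Theorem 4.2 in (Resnick, 2007, p. 81) we also get

(3.17)    $\hat\nu_n := \frac{1}{k} \sum_{i=1}^n \delta_{(x_F - X_i)/(x_F - X_{(k)})} \Rightarrow \nu$ in $M_+[0,\infty)$.

Here we can represent $\hat M(X_{(\lceil ku \rceil)})$ in terms of the empirical measure as

$\frac{\hat M(X_{(\lceil ku \rceil)})}{x_F - X_{(k)}} = \frac{k}{\lceil ku \rceil - 1} \int_0^{(x_F - X_{(\lceil ku \rceil)})/(x_F - X_{(k)})} \hat\nu_n[0,x)\,dx$

and taking the same route as in Theorem 3.2 we get

$S_n(u) = \Big( \frac{x_F - X_{(\lceil ku \rceil)}}{x_F - X_{(k)}}, \frac{\hat M(X_{(\lceil ku \rceil)})}{x_F - X_{(k)}} \Big) \xrightarrow{P} \big( u^{-\xi}, \tfrac{\xi}{\xi - 1}\, u^{-\xi} \big)$

in $D_l^2(0,1]$. From this we get in the Fell topology

$\Big\{ \Big( \frac{x_F - X_{(i)}}{x_F - X_{(k)}}, \frac{\hat M(X_{(i)})}{x_F - X_{(k)}} \Big) : 1 < i \le k \Big\} \xrightarrow{P} \Big\{ \big(t, \tfrac{\xi}{\xi - 1}\, t\big) : 0 \le t \le 1 \Big\}$.

Finally, using the fact that

$\frac{X_{(1)} - X_{(k)}}{x_F - X_{(k)}} \xrightarrow{P} 1$,

and the identity

$\frac{1}{X_{(1)} - X_{(k)}} \big\{ (X_{(i)} - X_{(k)}, \hat M(X_{(i)})) : 1 < i \le k \big\} = \frac{x_F - X_{(k)}}{X_{(1)} - X_{(k)}} \Big\{ \Big( 1 - \frac{x_F - X_{(i)}}{x_F - X_{(k)}}, \frac{\hat M(X_{(i)})}{x_F - X_{(k)}} \Big) : 1 < i \le k \Big\}$

we get the final result. □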



3.3. Zero Slope. The next result follows from Theorems 3.3.26 and 3.4.13(b) in Embrechts et al. (1997) and Proposition 1.4 in Resnick (1987).

Theorem 3.9. The following conditions are equivalent for a distribution function F with right end point $x_F \le \infty$:

(1) There exists $z < x_F$ such that $\bar F$ has a representation

(3.18)    $\bar F(x) = c(x) \exp\Big\{ -\int_z^x \frac{1}{a(t)}\,dt \Big\}$, for all $z < x < x_F$,

where c(x) is a measurable function satisfying $c(x) \to c > 0$, $x \to x_F$, and a(x) is a positive, absolutely continuous function with density $a'(x) \to 0$ as $x \to x_F$.

(2) F is in the maximal domain of attraction of the Gumbel distribution, i.e.,

$F^n(c_n x + d_n) \to \exp\{-e^{-x}\}$ for all $x \in \mathbb{R}$,

where $d_n = F^{\leftarrow}(1 - 1/n)$ and $c_n = a(d_n)$.

(3) There exists a measurable function $\beta(u)$ such that

$\lim_{u \to x_F} \sup_{u \le x \le x_F} |F_u(x) - G_{0,\beta(u)}(x)| = 0$.



Theorem 3.3.26 in Embrechts et al. (1997) also says that a possible choice of the auxiliary function a(x) in (3.18) is

$a(x) = \int_x^{x_F} \frac{\bar F(t)}{\bar F(x)}\,dt$ for all $x < x_F$,

and for this choice, the auxiliary function is the ME function, i.e., $a(x) = M(x)$. Furthermore, we also know that $a'(x) \to 0$ as $x \to x_F$ and this implies that $M(u)/u \to 0$ as $u \to x_F$.

A prime example in this class is the exponential distribution for which the ME function is a constant. The domain of attraction of the Gumbel distribution is a very big class including distributions as diverse as the normal and the log-normal. It is indexed by auxiliary functions which only need to satisfy $a'(x) \to 0$ as $x \to x_F$. Since M(x) is a choice for the auxiliary function a(x), the class of ME functions corresponding to the domain of attraction of the Gumbel distribution is very large.
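The constant ME function of the exponential distribution is easy to see in simulation. The sketch below draws a unit-rate exponential sample and evaluates the empirical ME function at a few thresholds. The seed, sample size and thresholds are illustrative assumptions.

```python
import random


def empirical_mean_excess(sample, u):
    """Average excess over u, as in (1.4)."""
    excesses = [x - u for x in sample if x > u]
    return sum(excesses) / len(excesses)


rng = random.Random(2)   # fixed seed for reproducibility
sample = [rng.expovariate(1.0) for _ in range(50_000)]
for u in (0.0, 0.5, 1.0, 2.0):
    # By the memoryless property every value should be close to the mean 1.
    print(u, empirical_mean_excess(sample, u))
```

A flat ME plot is therefore consistent with $\xi = 0$, although, as noted above, other members of the Gumbel domain of attraction have ME functions that are far from constant at moderate thresholds.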

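Theorem 3.10. Suppose $(X_n, n \ge 1)$ are iid random variables with distribution $F$ which
satisfies any one of the conditions in Theorem 3.9. Then in $\mathcal{F}$,

$$\frac{\ln 2}{X_{(\lceil k/2\rceil)} - X_{(k)}}\left\{\left(X_{(i)} - X_{(k)}, \hat M(X_{(i)})\right) : 1 < i \le k\right\}
\xrightarrow{P} \mathcal{S} := \{(t, 1) : t \ge 0\}.$$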
Proof. This is again similar to the proof of Theorem 3.2. Using Theorem 3.9(2) we get

$$n\bar F(c_n x + d_n) \to e^{-x} \quad \text{for all } x \in \mathbb{R}.$$

Since $n/k_n \to \infty$ we also get

$$\frac{n}{k}\,\bar F\left(c_{\lceil n/k\rceil}x + d_{\lceil n/k\rceil}\right) \to e^{-x} \quad \text{for all } x \in \mathbb{R}$$
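and then Theorem 5.3(ii) in Resnick (2007, p. 139) gives us

$$\nu_n := \frac{1}{k}\sum_{i=1}^{n} \delta_{(X_i - d_{\lceil n/k\rceil})/c_{\lceil n/k\rceil}} \Rightarrow \nu \quad \text{in } M_+(-\infty, \infty],$$

where $\nu(x, \infty] = e^{-x}$ for all $x \in \mathbb{R}$. Following the arguments in the proof of Theorem 4.2 in
Resnick (2007, p. 81) we get

$$\frac{X_{(k)} - d_{\lceil n/k\rceil}}{c_{\lceil n/k\rceil}} \xrightarrow{P} 0$$

and then

$$\hat\nu_n := \frac{1}{k}\sum_{i=1}^{n} \delta_{(X_i - X_{(k)})/c_{\lceil n/k\rceil}} \Rightarrow \nu.$$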

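Now, one can easily establish the identity between the empirical mean excess function and
the empirical measure

$$\hat M(X_{(\lceil ku\rceil)}) = \frac{k\,c_{\lceil n/k\rceil}}{\lceil ku\rceil - 1}\int_{(X_{(\lceil ku\rceil)} - X_{(k)})/c_{\lceil n/k\rceil}}^{\infty} \hat\nu_n(x, \infty]\,dx.$$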

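From this fact it follows that

$$S_n(u) = \left(\frac{X_{(\lceil ku\rceil)} - X_{(k)}}{c_{\lceil n/k\rceil}}, \frac{\hat M(X_{(\lceil ku\rceil)})}{c_{\lceil n/k\rceil}}\right) \xrightarrow{P} S(u) = (-\ln u, 1)$$

in $\mathbb{D}_l^2(0, 1]$ and that in turn implies

$$\left\{\left(\frac{X_{(i)} - X_{(k)}}{c_{\lceil n/k\rceil}}, \frac{\hat M(X_{(i)})}{c_{\lceil n/k\rceil}}\right) : 1 < i \le k\right\} \xrightarrow{P} \{(t, 1) : 0 \le t < \infty\}.$$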
Finally, using the fact that

$$\frac{X_{(\lceil k/2\rceil)} - X_{(k)}}{c_{\lceil n/k\rceil}} \xrightarrow{P} \ln 2,$$

we get the desired result. □

4. Comparison with Other Methods of Extreme Value Analysis

For iid random variables from a distribution in the maximal domain of attraction of the
Fréchet, Weibull or the Gumbel distributions, Theorems 3.2, 3.7 and 3.10 describe the
asymptotic behavior of the ME plot for high thresholds. Linearity of the ME plot for high
order statistics indicates there is no evidence against the hypothesis that the GPD model is
a good fit for the thresholded data.

Furthermore, we obtain a natural estimate $\hat\xi$ of $\xi$ by fitting a line to the linear part of
the ME plot using least squares to get a slope estimate $\hat b$ and then recovering $\hat\xi = \hat b/(1 + \hat b)$.
Although natural, convergence of the ME plot to a linear limit does not guarantee consistency
of this estimate $\hat\xi$ and this is still under consideration. Proposition 5.1.1 in Das and Resnick
(2008) explains why the slope of the least squares line is not a continuous functional of finite
random sets.
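
The map from slope to $\hat\xi$ can be traced back to the ME function of the GPD. As a short worked
step, using the standard formula for a GPD with parameters $\xi < 1$ and $\beta > 0$ (the symbol $b$
for the slope matches $\hat b$ above):

$$M(u) = \frac{\beta}{1 - \xi} + \frac{\xi}{1 - \xi}\,u
\quad\Longrightarrow\quad
b = \frac{\xi}{1 - \xi}
\quad\Longleftrightarrow\quad
\xi = \frac{b}{1 + b}.$$

For instance, a slope of $b = 1$ corresponds to $\xi = 1/2$, and a slope of $b = 0$ corresponds to $\xi = 0$.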



Davison and Smith (1990) give another method to estimate $\xi$. They suggest a way to find
a threshold using the ME plot and then fit a GPD to the points above the threshold using
maximum likelihood estimation. For both this and the LS method, the ME plot obviously
plays a central role. We analyze several simulated and real data sets in Sections 5 and 6
using only the LS method.

With any method, an important step is choice of threshold guided by the ME plot so that 
the plot is roughly linear above this threshold. Threshold choice can be challenging and 
parameter estimates can be sensitive to the threshold choice, especially when real data is 
analyzed. 

The ME plot is only one of a suite of widely used tools for extreme value model selection.
Other techniques are the Hill plot, the Pickands plot, the moment estimator plot and the
QQ plot; cf. Chapter 4, Resnick (2007) and de Haan and Ferreira (2006). Some comparisons
from the point of view of asymptotic bias and variance are in de Haan and Peng (1998).
Here we review definitions and basic facts about several methods assuming that $X_1, \dots, X_n$
is an iid sample from a distribution in the maximal domain of attraction of an extreme value
distribution. The asymptotics require $k = k_n$, the number of upper order statistics used for
estimation, to be a sequence increasing to $\infty$ such that $k_n/n \to 0$.

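(1) The Hill estimator based on $m$ upper order statistics is

$$H_{m,n} = \left(\frac{1}{m}\sum_{i=1}^{m} \log\frac{X_{(i)}}{X_{(m+1)}}\right)^{-1}, \quad 1 \le m < n.$$

If $\xi > 0$ then $H_{k_n,n} \xrightarrow{P} \alpha = 1/\xi$. The Hill plot is the plot of the points
$\{(k, H_{k,n}) : 1 \le k < n\}$.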

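(2) The Pickands estimator does not impose any restriction on the range of $\xi$. The Pickands
estimator,

$$\hat\xi_{m,n} = \frac{1}{\log 2}\,\log\left(\frac{X_{(m)} - X_{(2m)}}{X_{(2m)} - X_{(4m)}}\right), \quad 1 \le m \le [n/4],$$

is consistent for $\xi \in \mathbb{R}$; i.e., $\hat\xi_{k_n,n} \xrightarrow{P} \xi$ as $n \to \infty$. The Pickands plot is the plot of the
points $\{(m, \hat\xi_{m,n}) : 1 \le m \le [n/4]\}$.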
(3) The QQ plot treats the cases $\xi > 0$ and $\xi < 0$ separately. When $\xi > 0$, the QQ plot
consists of the points $Q_{m,n} := \{(-\log(i/m), \log(X_{(i)}/X_{(m)})) : 1 \le i \le m\}$ where $m < n$
is a suitably chosen integer. Das and Resnick (2008) showed $Q_{k_n,n} \to \{(t, \xi t) : t \ge 0\}$
in $\mathcal{F}$ equipped with the Fell topology. So the limit is a line with slope $\xi$ and the LS
estimator is consistent (Das and Resnick, 2008; Kratz and Resnick, 1996).



In the case when $\xi < 0$ the QQ plot can be defined as the plot of the points
$Q'_{m,n} := \{(X_{(i)}, G^{\leftarrow}_{\hat\xi,1}(i/(n+1))) : 1 \le i \le n\}$, where $\hat\xi$ is an estimate of $\xi$ based on $m$
upper order statistics.
Figure 1. ME plot $\{(X_{(i)}, \hat M(X_{(i)})) : 1 \le i \le 50000\}$ of 50000 random variables
from the Pareto(2) distribution ($\xi = 0.5$). (a) Entire plot, (b) Order statistics 250-50000.



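(4) The moment estimator (Dekkers et al., 1989; de Haan and Ferreira, 2006) is another
method which works for all $\xi \in \mathbb{R}$ and is defined as

$$\hat\xi^{(moment)}_{m,n} = H^{(1)}_{m,n} + 1 - \frac{1}{2}\left(1 - \frac{(H^{(1)}_{m,n})^2}{H^{(2)}_{m,n}}\right)^{-1},$$

where

$$H^{(r)}_{m,n} = \frac{1}{m}\sum_{i=1}^{m}\left(\log\frac{X_{(i)}}{X_{(m+1)}}\right)^{r}, \quad r = 1, 2.$$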

The moment estimator plot is the plot of the points $\{(k, \hat\xi^{(moment)}_{k,n}) : 1 \le k < n\}$. The
moment estimator is consistent for $\xi$.
(5) To complete this survey, recall that the ME plot converges to a nonrandom line when
$\xi < 1$.

The Hill and QQ plots work best for $\xi > 0$ and the ME plot requires knowledge that $\xi < 1$.
Each plot requires the data be properly thresholded. The ME plot requires thresholding but 
also that k be sufficiently large that proper averaging takes place. 
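
The estimators reviewed in items (1), (2) and (4) can be sketched in a few lines. The following is a
minimal illustration rather than the implementation used for the figures: the function names, the
seed, the sample size and the choice $m = 500$ are assumptions made here, and the test sample is a
strict Pareto tail $P[X > x] = x^{-2}$, $x \ge 1$ (so $\xi = 0.5$), chosen because the Hill estimator is
well behaved for it.

```python
import math
import random


def hill(desc, m):
    """Hill estimator of alpha = 1/xi; desc is sorted in decreasing order, m < len(desc)."""
    mean_log = sum(math.log(desc[i] / desc[m]) for i in range(m)) / m
    return 1.0 / mean_log


def pickands(desc, m):
    """Pickands estimator of xi; requires 4 * m <= len(desc)."""
    num = desc[m - 1] - desc[2 * m - 1]          # X_(m) - X_(2m)
    den = desc[2 * m - 1] - desc[4 * m - 1]      # X_(2m) - X_(4m)
    return math.log(num / den) / math.log(2.0)


def moment(desc, m):
    """Moment estimator of xi built from the first two log-moments H^(1), H^(2)."""
    logs = [math.log(desc[i] / desc[m]) for i in range(m)]
    h1 = sum(logs) / m
    h2 = sum(v * v for v in logs) / m
    return h1 + 1.0 - 0.5 / (1.0 - h1 * h1 / h2)


# Illustrative sample (an assumption, not the paper's data): strict Pareto, alpha = 2.
random.seed(0)
sample = sorted(((1.0 - random.random()) ** -0.5 for _ in range(20000)), reverse=True)
m = 500  # number of upper order statistics, chosen arbitrarily for the sketch
alpha_hat = hill(sample, m)
xi_pickands = pickands(sample, m)
xi_moment = moment(sample, m)
print(alpha_hat, xi_pickands, xi_moment)
```

In practice each estimator would be evaluated over a range of $m$ and plotted, which is what the
Hill, Pickands and moment estimator plots display.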

5. Simulation 

We divide this section into three subsections. In Subsection 5.1 we show simulation results
for mean excess plots of some standard distributions with well-behaved tails. In Subsections 5.2
and 5.3 we discuss simulation results for some distributions with either difficult tail behavior
or infinite mean.

5.1. Standard Situations. 

5.1.1. Pareto distribution. The obvious first choice for a distribution function to simulate
from is the GPD. For the GPD the ME plot should be roughly linear. We simulate 50000
random variables from the Pareto(2) distribution. This means that the parameters of the
GPD are $\xi = 0.5$ and $\beta = 1$. Figure 1 shows the mean excess plot for this data set. Observe
that in Figure 1(a) the first part of the plot is quite linear but it is scattered for very high order
statistics. The reason behind this phenomenon is that the empirical mean excess function
for high thresholds is the average of the excesses of a small number of upper order statistics.
When averaging over few numbers, there is high variability and therefore, this part of the
plot appears very non-linear and is uninformative. In Figure 1(b) we zoom into the plot
by leaving out the top 250 points. We calculate using all the data but plot only the points
$\{(X_{(i)}, \hat M(X_{(i)})) : 250 \le i \le 50000\}$. This restricted plot looks linear. We fit a least squares
line to this plot and the estimate of the slope is 0.9701. Since the slope is $\xi/(1 - \xi)$ we get
the estimate of $\xi$ to be 0.5076.
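
The procedure just described (compute the empirical ME function from all the data, drop the
highest order statistics from the plot, fit a least squares line) can be sketched as follows. This is
an illustrative reconstruction, not the code behind Figure 1: the function names and the seed are
assumptions, the empirical ME function is taken to be the average excess of the observations
strictly above the threshold, ties are assumed absent, and the sample is drawn by inverse
transform from a GPD with $\xi = 0.5$, $\beta = 1$.

```python
import random


def me_plot(data):
    """Points (X_(i), Mhat(X_(i))) for i = 2, ..., n, with X_(1) >= ... >= X_(n)."""
    desc = sorted(data, reverse=True)
    points = []
    running_sum = desc[0]              # sum of the observations above the current threshold
    for i in range(1, len(desc)):      # 0-based index i is order statistic i + 1
        u = desc[i]
        points.append((u, running_sum / i - u))   # mean of the i values above u, minus u
        running_sum += u
    return points


def ls_slope(points):
    """Slope of the ordinary least squares line through the points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx


random.seed(1)
xi, beta = 0.5, 1.0
# Inverse transform for the GPD: X = (beta / xi) * (U^(-xi) - 1), U uniform on (0, 1].
sample = [beta / xi * ((1.0 - random.random()) ** -xi - 1.0) for _ in range(50000)]
points = me_plot(sample)
restricted = points[248:]              # keep order statistics 250, ..., 50000
slope = ls_slope(restricted)
xi_hat = slope / (1.0 + slope)
print(slope, xi_hat)
```

The numbers produced depend on the random seed and will not reproduce the values reported
above; the sketch only illustrates the steps.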



Figure 2. ME plot of 50000 random variables from the totally right skewed Stable(1.5)
distribution ($\xi = 2/3$). (a) Entire plot, (b) Order statistics 120-30000,
(c) 180-20000, (d) 270-10000.
5.1.2. Right-skewed stable distribution. We next simulate 50000 random variables from a
totally right skewed stable(1.5) distribution. So $\bar F \in RV_{-1.5}$ and then $\xi = 2/3$. Figure 2(a) is
the ME plot obtained from this data set. This is not a sample from a GPD, but only from a
distribution in the maximal domain of attraction of a GPD. The ME function is not exactly
linear and for estimating $\xi$ we should concentrate on high thresholds. As we did for the last
example we drop points in the plot for very high order statistics since they are averages
of very few values. Figures 2(b), 2(c) and 2(d) confine the plot to the order statistics
120-30000, 180-20000 and 270-10000 respectively, i.e., they plot the points $(X_{(i)}, \hat M(X_{(i)}))$ for $i$
in the specified range. As we restrict the plot more and more, the plot becomes increasingly
linear. In Figure 2(d) the least squares estimate of the slope of the line is 1.763 and hence
the estimate of $\xi$ is 0.638.
5.1.3. Beta distribution. Figure 3 gives the ME plot for 50000 random variables from the
beta(2,2) distribution which is in the maximal domain of attraction of the Weibull distribution
with the parameter $\xi = -0.5$. Figure 3(a) is the entire ME plot and then Figures 3(b), 3(c) and 3(d)
plot the empirical ME function for the order statistics 150-35000, 300-20000 and 450-5000
respectively. The last plot is quite linear and the estimate of $\xi$ is $-0.5361$.

Figure 3. ME plot of 50000 random variables from the beta(2,2) distribution
($\xi = -0.5$). (a) Entire plot, (b) Order statistics: 150-35000, (c) 300-20000,
(d) 450-5000.

5.2. Difficult Cases. 

5.2.1. Lognormal distribution. The lognormal(0,1) distribution is in the maximal domain of
attraction of the Gumbel and hence $\xi = 0$. The ME function of the lognormal has the form

$$M(u) = \frac{u}{\ln u}\,(1 + o(1)) \quad \text{as } u \to \infty;$$

see Table 3.4.7 in Embrechts et al. (1997, p. 161). So $M(u)$ is regularly varying of index 1 but
still $M'(u) \to 0$. Figure 4(a) shows the ME plot obtained for a sample of size $10^5$ from the
lognormal(0,1) distribution. Figures 4(b), 4(c) and 4(d) show the empirical ME functions for
the order statistics 150-70000, 300-40000 and 450-10000 respectively. The slopes of the least
squares lines in Figures 4(b), 4(c) and 4(d) are 0.3351, 0.3112 and 0.267 respectively. The
estimate of $\xi$ also decreases steadily as we zoom in towards the higher order statistics, from
0.251 in 4(b) to 0.2107 in 4(d). Furthermore, a curve is evident in the plots and the slope of
the curve is decreasing, albeit very slowly, as we look at higher and higher thresholds. At
first glance the ME function might seem to resemble that of a distribution in the maximal
domain of attraction of the Fréchet. The curve becomes evident only after a detailed analysis
of the plot. That is possible because the data are simulated, but in practice the analysis would
be difficult. For this example, the ME plot is not a very effective diagnostic for discerning
the model.

Figure 4. ME plot of $10^5$ random variables from the lognormal(0,1) distribution
($\xi = 0$). (a) Entire plot, (b) Order statistics: 150-70000, (c) 300-40000,
(d) 450-10000.
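
A heuristic calculation suggests why the fitted slopes stay well away from 0. It assumes that the
leading term $u/\ln u$ may be differentiated as if it were exact, i.e., the $(1 + o(1))$ factor is ignored, so
the numbers are only indicative:

$$\frac{d}{du}\,\frac{u}{\ln u} = \frac{1}{\ln u} - \frac{1}{(\ln u)^2}.$$

At $u = e^3 \approx 20$ this equals $1/3 - 1/9 \approx 0.22$, and at $u = e^6 \approx 403$ it is still $1/6 - 1/36 \approx 0.14$. The
local slope therefore decays only at a logarithmic rate over the range of thresholds that a sample
of this size can reach.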

5.2.2. A non-standard distribution. We also try a non-standard distribution for which
$\bar F^{-1}(x) = x^{-1/2}(1 - 10\ln x)$, $0 < x < 1$. This means that $\bar F \in RV_{-2}$ and therefore $\xi = 0.5$. The exact
form of $\bar F$ is given by

(5.1) $\bar F(x) = 400\,W(x e^{1/20}/20)^2\,x^{-2}$ for all $x > 1$,

where $W$ is the Lambert W function satisfying $W(x)e^{W(x)} = x$ for all $x > 0$. Observe that
$W(x) \to \infty$ as $x \to \infty$ and $W(x) < \log(x)$ for $x > e$. Furthermore,

$$\frac{\log(x)}{W(x)} = 1 + \frac{\log W(x)}{W(x)} \to 1 \quad \text{as } x \to \infty,$$

and hence $W(x)$ is a slowly varying function. This is therefore an example where the slowly
varying term contributes significantly to $\bar F$. That was not the case in the Pareto or the stable
examples. We simulated $10^5$ random variables from this distribution. Figure 5(a) gives the
entire ME plot from this data set. Figures 5(b) and 5(c) plot the ME function for the order
statistics 150-70000 and 400-20000 respectively. In Figure 5(c) the estimate of $\xi$ is 0.6418
which is a somewhat disappointing estimate given that the sample size was $10^5$. Figure 5(d)
is the Hill plot from this data set using the QRMlib package in R. It plots the estimate of
$\alpha = 1/\xi$ obtained by choosing different values of $k$. It is evident from this that the Hill
estimator does not perform well here. For none of the values of $k$ is the Hill estimator even
close to the true value of $\alpha$ which is 2. We conclude, not surprisingly, that a slowly varying
function increasing to infinity can fool both the ME plot and the Hill plot. See Degen et al.
(2007) for a discussion on the behavior of the ME plot for a sample simulated from the
g-and-h distribution and Resnick (2007) for Hill horror plots.

Figure 5. ME plot of $10^5$ random variables from the distribution in (5.1)
($\xi = 0.5$). (a) Entire plot, (b) Order statistics: 150-70000, (c) 400-20000, (d)
Hill plot estimating $\alpha = 1/\xi$.
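
Because the distribution in (5.1) is specified through the inverse of its tail, samples can be drawn
by inverse transform, and the closed form (5.1) can be checked numerically against that inverse.
The sketch below is one way to do this; the function names, the seed and the Newton iteration
used for the Lambert W function are assumptions made for the illustration, not the method used
to produce Figure 5.

```python
import math
import random


def tail_quantile(y):
    """The x with P[X > x] = y, for 0 < y <= 1: x = y^(-1/2) * (1 - 10 * ln y)."""
    return y ** -0.5 * (1.0 - 10.0 * math.log(y))


def lambert_w(z, tol=1e-14):
    """Principal branch W(z) for z >= 0, by Newton's method on w * exp(w) = z."""
    w = math.log1p(z)                  # start above the root, since W(z) <= log(1 + z)
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def tail(x):
    """Closed form (5.1) for P[X > x], x >= 1."""
    w = lambert_w(x * math.exp(1.0 / 20.0) / 20.0)
    return 400.0 * w * w / (x * x)


# Inverse transform sampling: apply the tail quantile to U uniform on (0, 1].
random.seed(2)
sample = [tail_quantile(1.0 - random.random()) for _ in range(100000)]

# Consistency check between (5.1) and the inverse tail at a few levels.
for y in (0.9, 0.5, 0.01, 1e-4):
    print(y, tail(tail_quantile(y)))
```

The sample can then be passed to any ME plot or Hill plot routine to reproduce the qualitative
behavior described above.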



5.3. Infinite Mean: Pareto with $\xi = 2$. This simulation sheds light on the behavior of
the ME plot when $\xi > 1$. In this case the ME function does not exist but the empirical
ME plot does. Figure 6 displays the ME plot for 50000 random variables simulated from the
Pareto(0.5) distribution. The plot is certainly far from linear even for high order statistics
and the least squares line has slope 7780.84 which gives an estimate of $\xi$ of 0.9999. This
indicates that the ME plot is not a good diagnostic in this case.

Figure 6. ME plot of 50000 random variables from the Pareto(0.5) distribution
($\xi = 2$). (a) Entire plot, (b) Order statistics 250-10000.




6. ME Plots for Real Data 

6.1. Size of Internet Response. This data set consists of Internet response sizes
corresponding to user requests. The sizes are thresholded to be at least 100KB. The data set is a
part of a bigger set collected in April 2000 at the University of North Carolina at Chapel
Hill.

Figure 7 contains various plots of the data. Figures 7(b) and 7(e) are the Hill plot
(estimating $1/\xi$) and the Pickands plot respectively. It is difficult to infer anything from
these plots though superficially they appear stable. Figures 7(c) and 7(d) are the entire ME
plot and the ME plot restricted to order statistics 300-12500. The second plot does seem
to be very linear and gives an estimate of $\xi$ of 0.5908. Figures 7(f) and 7(g) are the QQ
plots for the data for k = 15000 and k = 2500 (as explained in Section 4). The estimates of
$\xi$ in these two plots are 0.8851 and 0.6362. The estimates of $\xi$ obtained from the QQ plot
7(g) and the ME plot 7(d) are close and the plots are also linear. So we believe that this is
a reasonable estimate of $\xi$.

6.2. Volume of Water in the Hudson River. We now analyze data on the average
daily discharge of water (in cubic feet per second) in the Hudson River measured at the U.S.
Geological Survey site number 01318500 near Hadley, NY. The range of the data is from July
15, 1921 to December 31, 2008 for a total of 31946 data points.

Figure 8(a) is the time series plot of the original data and it shows the presence of
periodicity in the data. The volume of water is typically much higher in April and May than in the
rest of the year, which is possibly due to snow melt. We 'homoscedasticize' the data in the
following way. We compute the standard deviation of the average discharge of water for every
day of the year and then divide each data point by the standard deviation corresponding to
that day. If the original data is, say, $(X_{7/15/1921}, \dots, X_{12/31/2008})$ then we transform it to
$(X_{7/15/1921}/S_{7/15}, \dots, X_{12/31/2008}/S_{12/31})$, where $S_{7/15}$ is the standard deviation of the data
points obtained on July 15 in the different years in the range of the data and similarly $S_{12/31}$
is the same for December 31. The plot of the transformed points is given in Figure 8(b). We then
fit an AR(33) model to this data using the function ar in the stats package in R. The lag
was chosen based on the AIC criterion. Figures 8(c) and 8(d) show the residuals and their
ACF plot respectively. This encourages us to assume that there is no linear dependence in the
residuals.
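
The rescaling step described above can be sketched as follows. The function name, the synthetic
three-year series and the use of the sample standard deviation (`statistics.stdev`) are assumptions
made for the illustration; the text does not specify which standard deviation estimate was used,
and the subsequent AR fit is not reproduced here. Each calendar day needs at least two
observations for the sample standard deviation to be defined.

```python
import math
import random
from collections import defaultdict
from datetime import date, timedelta
from statistics import stdev


def homoscedasticize(dates, values):
    """Divide each value by the standard deviation of all values sharing its calendar day."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[(d.month, d.day)].append(v)
    sd = {key: stdev(vals) for key, vals in groups.items()}
    return [v / sd[(d.month, d.day)] for d, v in zip(dates, values)]


# Synthetic daily series with a seasonal scale (hypothetical data, three non-leap years).
random.seed(3)
start = date(2001, 1, 1)
dates = [start + timedelta(days=j) for j in range(3 * 365)]
values = [
    (2.0 + math.sin(2.0 * math.pi * d.timetuple().tm_yday / 365.0)) * random.gauss(10.0, 1.0)
    for d in dates
]
scaled = homoscedasticize(dates, values)
print(scaled[:3])
```

After the transformation every calendar day has unit sample standard deviation across years,
which is the sense in which the series has been made homoscedastic.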
Figure 7. Internet response sizes. (a) Scatter plot, (b) Hill plot estimating
$\alpha = 1/\xi$, (c) ME plot, (d) ME plot for order statistics 300-12500, (e) Pickands
plot for $\xi$, (f) QQ plot with k = 15000, (g) QQ plot with k = 5000.
We now apply the tools for extreme value analysis to the residuals. Figure 8(e) is the Hill
plot and it is difficult to draw any inference from this plot in this case. Figures 8(f) and 8(g)
are the entire ME plot and the ME plot restricted to the order statistics 300-1300. From
8(g) we get an estimate of $\xi$ of 0.261. The Pickands plot in 8(h) and the QQ plots in
8(i) and 8(j) suggest an estimate of $\xi$ around 0.4. A definite curve is visible in the QQ plot
even for k = 600. But the slope of the least squares line fitting the QQ plot supports the
estimate suggested by the Pickands plot and the ME plot. We see that it is difficult to reach
a conclusion about the range of $\xi$. Still we infer that 0.4 is a reasonable estimate of $\xi$ for this
data since that is being suggested by two different methods.



6.3. Ozone level in New York City. We also apply the methods to a data set obtained
from http://www.epa.gov/ttn/airs/aqsdatamart. This is the data on daily maxima of the
level of ozone (in parts per million) in New York City on measurements closest to the ground
level observed between January 1, 1980 and December 31, 2008.

Figure 9(a) is the time series plot of the data. This data set also showed a seasonal
component which accounted for high values during the summer months. We transform the
data set to a homoscedastic series (Figure 9(b)) using the same technique as explained in
Subsection 6.2. Fitting an AR(16) model we get residuals which are uncorrelated; see
Figures 9(c) and 9(d).

Figure 8. Daily discharge of water in the Hudson River. (a) Time series plot,
(b) Homoscedasticized plot, (c) Residual plot, (d) ACF of residuals, (e) Hill
plot for $\alpha = 1/\xi$, (f) ME plot, (g) ME plot for order statistics 300-1300, (h)
Pickands plot, (i) QQ plot with k = 8000, (j) QQ plot with k = 600.
Figure 9. Ozone level in New York City. (a) Time series plot, (b) Homoscedasticized
plot, (c) Residual plot, (d) ACF of residuals, (e) Hill plot for
$\alpha = 1/\xi$, (f) ME plot, (g) ME plot for order statistics 300-1300, (h) Pickands
plot, (i) QQ plot with k = 4000, (j) QQ plot with k = 550.
The Hill plot in Figure 9(e) again fails to give a reasonable estimate of the tail index. The
ME plots in Figures 9(f) and 9(g) are also very rough. Figure 9(g) is the plot of the points
$(X_{(i)}, \hat M(X_{(i)}))$ for $300 \le i \le 1300$ and the least squares line fitting these points has slope
0.0472 which gives an estimate of $\xi$ of 0.0451. This is consistent with the Pickands plot
in Figure 9(h). This suggests that the residuals may be in the domain of attraction of the Gumbel
distribution.

7. Conclusion

The ME plot may be used as a diagnostic to aid in tail or quantile estimation for risk 
management and other extreme value problems. However, some problems associated with its 
use certainly exist: 

• One needs to trim away $\{(X_{(i)}, \hat M(X_{(i)}))\}$ for small values of $i$ where too few terms
are averaged and also trim irrelevant terms for large values of $i$ which are governed
by either the center of the distribution or the left tail. So two discretionary cuts to
the data need be made whereas for other diagnostics only one threshold needs to be
selected.

• The analyst needs to be convinced $\xi < 1$ since for $\xi > 1$ random sets are the limits
for the normalized ME plot. Such random limits could create misleading impressions.
The Pickands and moment estimators place no such restriction on the range of $\xi$. The
QQ method works most easily when $\xi > 0$ but can be extended to all $\xi \in \mathbb{R}$. The
Hill method requires $\xi > 0$.

• Distributions not particularly close to GPD can fool the ME diagnostic. However, 
fairness requires pointing out that this is true of all the procedures in the extreme 
value catalogue. In particular, with heavy tail distributions, if a slowly varying factor 
is attached to a Pareto tail, diagnostics typically perform poorly. 

The standing assumption for the proofs in this paper is that $\{X_n\}$ is iid. We believe most
of the results on the ME plot hold under the assumption that the underlying sequence $\{X_n\}$
is stationary and the tail empirical measure is consistent for the limiting GPD distribution
of the marginal distribution of $X_1$. We intend to look into this further. Other open issues
engaging our attention include converses to the consistency of the ME plot and whether the slope
of the least squares line through the ME plot is a consistent estimator.

We are thankful to the referees and the editors for their valuable and detailed comments. 